[Draft] Support the globaltimer and smid on Intel Arch #4816
base: main
Pull Request Overview
This PR adds support for globaltimer and smid functions on Intel Architecture (XPU) by implementing Intel-specific inline assembly code and updating corresponding unit tests.
- Replaces CUDA PTX assembly with Intel inline assembly for globaltimer and smid functions
- Adds bitwise operations to extract subslice ID from status register for smid implementation
- Updates unit tests to support both CUDA and Intel XPU backends with appropriate assembly verification
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| third_party/intel/language/intel/utils.py | Implements Intel-specific inline assembly for globaltimer and smid functions |
| python/test/unit/language/test_core.py | Updates tests to support both CUDA and Intel XPU with backend-specific assertions |
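The backend-specific assertions described for the tests could be organized roughly as follows. This is a minimal sketch, not the PR's test code: `ASM_MARKERS`, `expected_asm_marker`, and `asm_uses_builtin` are hypothetical names, and the marker strings are assumptions based on the `%tsc` and `%sr0` registers used in this PR's inline assembly and on the PTX special registers `%globaltimer` and `%smid`.

```python
# Hypothetical lookup table: which register name should appear in the
# generated assembly for each (backend, builtin) pair.
ASM_MARKERS = {
    ("cuda", "globaltimer"): "%globaltimer",  # PTX special register
    ("cuda", "smid"): "%smid",                # PTX special register
    ("xpu", "globaltimer"): "%tsc",           # timestamp register read in this PR
    ("xpu", "smid"): "%sr0",                  # status register read in this PR
}


def expected_asm_marker(backend: str, builtin: str) -> str:
    """Return the substring a test would look for in the compiled assembly."""
    try:
        return ASM_MARKERS[(backend, builtin)]
    except KeyError:
        raise ValueError(f"no marker known for {backend!r}/{builtin!r}") from None


def asm_uses_builtin(asm_text: str, backend: str, builtin: str) -> bool:
    """Check whether the assembly text contains the expected register name."""
    return expected_asm_marker(backend, builtin) in asm_text
```

Keeping the markers in one table means a test body can stay backend-agnostic and only the table changes when a new backend is added.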
return core.inline_asm_elementwise(
    """{\n .decl globaltimer v_type=G type=ud num_elts=2 align=qword alias=<$0, 0> \n"""
    """ mov (M1_NM, 2) globaltimer(0, 0)<1> %tsc(0,0)<1;1,0> \n}""", "=rw.u", [], dtype=core.uint64, is_pure=False,
    pack=1, _semantic=_semantic)
[nitpick] The inline assembly string is complex and split across multiple lines with embedded newlines and escape sequences. Consider extracting this assembly code to a constant or using a more readable multiline string format to improve maintainability.
Suggested change:
- return core.inline_asm_elementwise(
-     """{\n .decl globaltimer v_type=G type=ud num_elts=2 align=qword alias=<$0, 0> \n"""
-     """ mov (M1_NM, 2) globaltimer(0, 0)<1> %tsc(0,0)<1;1,0> \n}""", "=rw.u", [], dtype=core.uint64, is_pure=False,
-     pack=1, _semantic=_semantic)
+ GLOBALTIMER_ASM = (
+     "{\n"
+     " .decl globaltimer v_type=G type=ud num_elts=2 align=qword alias=<$0, 0> \n"
+     " mov (M1_NM, 2) globaltimer(0, 0)<1> %tsc(0,0)<1;1,0> \n"
+     "}"
+ )
+ return core.inline_asm_elementwise(
+     GLOBALTIMER_ASM, "=rw.u", [], dtype=core.uint64, is_pure=False, pack=1, _semantic=_semantic
+ )
Copilot uses AI. Check for mistakes.
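For readers less familiar with this assembly: the `.decl` line declares a two-element `ud` (unsigned dword) variable aliased onto the 64-bit output `$0`, and the `mov` copies two dwords from `%tsc` into it. The host-side sketch below illustrates that aliasing in plain Python. It assumes element 0 holds the low 32 bits (a little-endian layout), which this excerpt does not confirm; `combine_dwords` is a hypothetical helper and not part of the PR.

```python
import struct


def combine_dwords(lo: int, hi: int) -> int:
    """View two unsigned 32-bit values as one unsigned 64-bit value.

    Assumption: `lo` is element 0 of the aliased variable and occupies the
    low half of the 64-bit result (little-endian layout).
    """
    # Pack the two dwords back to back, then reinterpret the same 8 bytes
    # as a single unsigned 64-bit integer.
    raw = struct.pack("<2I", lo, hi)
    (value,) = struct.unpack("<Q", raw)
    return value
```

Under that assumption the result equals `(hi << 32) | lo`, so `combine_dwords(0x89ABCDEF, 0x01234567)` gives `0x0123456789ABCDEF`.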
    _semantic=_semantic)
sr = core.inline_asm_elementwise("mov (M1_NM, 1) $0(0, 0)<1> %sr0(0,0)<0;1,0>", "=rw.u", [], dtype=core.uint32,
                                 is_pure=True, pack=1, _semantic=_semantic)
pos: core.constexpr = core.constexpr(9)
The magic number 9 should be documented or extracted to a named constant to explain what bit position it represents in the status register.
Suggested change:
- pos: core.constexpr = core.constexpr(9)
+ pos: core.constexpr = core.constexpr(STATUS_REGISTER_BIT_POSITION)
sr = core.inline_asm_elementwise("mov (M1_NM, 1) $0(0, 0)<1> %sr0(0,0)<0;1,0>", "=rw.u", [], dtype=core.uint32,
                                 is_pure=True, pack=1, _semantic=_semantic)
pos: core.constexpr = core.constexpr(9)
subslice_mask: core.constexpr = core.constexpr((1 << 11) - 1)
The magic number 11 should be documented or extracted to a named constant to explain the bit width of the subslice mask.
Suggested change:
- subslice_mask: core.constexpr = core.constexpr((1 << 11) - 1)
+ SUBSLICE_MASK_BIT_WIDTH = 11 # Bit width of the subslice mask
+ subslice_mask: core.constexpr = core.constexpr((1 << SUBSLICE_MASK_BIT_WIDTH) - 1)
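Taken together, the two review comments about the constants 9 and 11 amount to a named bit-field extraction. The sketch below shows one way that could look in plain Python. The excerpt does not show how `pos` and `subslice_mask` are combined, so the mask-then-shift order here is an assumption; `SUBSLICE_ID_SHIFT`, `SUBSLICE_MASK`, and `extract_subslice_field` are hypothetical names, and the bit layout of `%sr0` should be checked against the hardware documentation.

```python
# Values taken from the diff; names are illustrative only.
SUBSLICE_ID_SHIFT = 9          # same value as `pos` in the diff
SUBSLICE_MASK_BIT_WIDTH = 11   # name proposed in the review suggestion
SUBSLICE_MASK = (1 << SUBSLICE_MASK_BIT_WIDTH) - 1  # keeps bits 0..10


def extract_subslice_field(sr: int) -> int:
    """Extract a field from a status-register value.

    Assumption: the mask is applied first and the result is then shifted
    right, which leaves bits 9 and 10 of `sr` as a 2-bit value. If the real
    code shifts first and masks second, the field would instead be the
    11 bits starting at bit 9.
    """
    return (sr & SUBSLICE_MASK) >> SUBSLICE_ID_SHIFT
```

With these definitions, a register value with only bit 9 set yields 1, and any bits at position 11 or higher are discarded by the mask.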
Signed-off-by: Lu,Chengjun <[email protected]> inline asm for smid and global timer.
Support the globaltimer and smid on Intel Arch.